code for the same protein (functionally speaking) in different organisms, in order
to deduce phylogenetic relationships. A further branch of this territory compares
the sequences of healthy and diseased organisms, in an attempt to assign genetic
causes to disease. The second territory attempts to find genes (and, ultimately, other
functionally important sequences such as those involved in regulation) via linguistic
inhomogeneities and to assign function to the genes by searching for regularities
(the “grammar” of the sequence). In its purest form, genomics could be viewed
simply as the study of the nonrandomness of DNA sequences. This endeavour is
still inchoate, since the regularities and their relation to function are not understood.
One may, however, be able to predict the structure from the sequence, which can
then be used to advance the search for function. Even coarse indications may be
useful; for example, transmembrane proteins typically possess several α-helices,
traversing the lipid bilayer, with characteristically hydrophobic amino acids. The
term “structural genomics” denotes the assignment of structure to a gene product by
any means available; “functional genomics” refers to the assignment of function to
a gene product.
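
The remark about transmembrane helices suggests how a coarse structural indication might be read off a sequence. The following is a minimal sketch, not a definitive predictor: it slides a window along a protein sequence and averages the Kyte-Doolittle hydropathy index, flagging stretches whose mean is strongly hydrophobic. The function names are hypothetical, and the window of 19 residues and threshold of 1.6 are conventional choices for this scale, assumed here rather than taken from the text.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Rolling sum: add the residue entering the window, drop the one leaving.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merged (start, end) ranges, 0-based and end-exclusive, of windows
    whose mean hydropathy reaches the threshold (an assumed cutoff)."""
    segments = []
    for start, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous hit: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


# Invented toy sequence: a hydrophobic stretch flanked by charged residues.
toy = "DEKRSDEKRS" + "LLIVALLIVALLIVALLIVAL" + "DEKRSDEKRS"
print(candidate_tm_segments(toy))
```

A flagged segment is only a hint that a membrane-spanning helix may be present; it says nothing by itself about the function of the protein.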
Proteomics focuses on gene products (i.e., proteins). The primary task is to cor-
relate the pattern of gene expression with the state of the organism. For any given
(eukaryotic) cell, typically only 10% of the genes are actually translated into proteins
under a given set of conditions and at a particular epoch in the cell’s life. On the other
hand, a given gene sequence can give rise to tens of different proteins, by varying
the arrangements of the exons (Sect. 14.8.5) and by posttranslational modification.
Insofar as proteins are the primary vehicle of phenotype, proteomics constitutes a
communication channel between genotype and phenotype. One may think of the
proteome as the “vocabulary” of the genome: Just as we use words to convey ideas
and build up our individual characters, so is the genome helpless without proteins.
Clearly, the proteome forms the molecular core of epigenetics. Once expression data
are available, work can start on their analysis. Via the proteome, genetic regulatory
networks can be elucidated.
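
The statement that one gene sequence can give rise to tens of proteins by varying the arrangement of its exons can be made concrete with a small counting sketch. It assumes a deliberately simplified model in which the first and last exons are always retained and each internal exon is independently kept or skipped, with the order preserved; real splicing is more constrained, and posttranslational modification adds further variety. The function name and exon labels are hypothetical.

```python
from itertools import product


def splice_isoforms(first, optional, last):
    """Yield every ordered exon combination in which each optional
    (internal) exon is either kept or skipped."""
    for keep in product((True, False), repeat=len(optional)):
        middle = [exon for exon, kept in zip(optional, keep) if kept]
        yield [first, *middle, last]


# Hypothetical gene with six exons, four of them optional.
isoforms = list(splice_isoforms("E1", ["E2", "E3", "E4", "E5"], "E6"))
print(len(isoforms))  # 2**4 = 16 combinations under this model
for isoform in isoforms[:3]:
    print("-".join(isoform))
```

Under this model n optional exons give 2^n candidate transcripts, so even a handful of optional exons is enough to reach the tens of products mentioned above.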
The raw data of proteomics are either the transcriptome (a list of all the transcribed
mRNAs and their abundances at a particular epoch) or the proteome (a list of all
the translated proteins and their abundances, or net rates of synthesis, at a particular
epoch). Given the processing that takes place between transcript and protein, it is not
surprising that there are often large differences between the transcriptome and pro-
teome. Experimentally, the compiling of such a list involves separating the proteins
from one another and then identifying them.
Comparison between the proteomes of diseased and healthy organisms forms the
foundation of the molecular diagnosis of disease. This is just one of many applications
of bioinformatics to medicine (see Part V).
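
One simple way such a comparison might be carried out is sketched below: for each protein, the ratio of its abundance in the diseased sample to that in the healthy sample is expressed as a base-2 logarithm, and proteins whose abundance changes at least twofold are flagged. This is an illustrative sketch only. The protein labels and abundances are invented, and the pseudocount and twofold cutoff are assumptions; a real analysis would also need replicates and a statistical test.

```python
import math


def log2_fold_changes(healthy, diseased, pseudocount=1.0):
    """log2(diseased / healthy) for every protein seen in either proteome.
    The pseudocount (an assumption) avoids division by zero when a protein
    is absent from one sample."""
    proteins = sorted(set(healthy) | set(diseased))
    return {
        p: math.log2(
            (diseased.get(p, 0.0) + pseudocount)
            / (healthy.get(p, 0.0) + pseudocount)
        )
        for p in proteins
    }


def flag_changed(fold_changes, min_abs_log2=1.0):
    """Keep proteins whose abundance changed at least 2**min_abs_log2-fold."""
    return {p: fc for p, fc in fold_changes.items() if abs(fc) >= min_abs_log2}


# Invented abundances (arbitrary units) for hypothetical proteins P1..P4.
healthy = {"P1": 99.0, "P2": 49.0, "P3": 7.0}
diseased = {"P1": 99.0, "P2": 199.0, "P3": 1.0, "P4": 15.0}

changes = log2_fold_changes(healthy, diseased)
for protein, fc in flag_changed(changes).items():
    print(f"{protein}: log2 fold change = {fc:+.2f}")
```

Taking the union of the two protein lists matters here, since a protein present in only one of the two proteomes is often the most informative difference.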
The investigation of the products of protein activity is called metabolomics; the metabolome
comprises all of the molecules in the cell apart from proteins and DNA (lipids and
polysaccharides are also usually excluded). Metabolomics is concerned first
with their identification, abundance, and localization, and then, using this information,
with how they are regulated, especially so as to keep the “essential variables” of the
organism within the limits compatible with survival. This regulation also comprises